7 research outputs found

    On Efficient Range-Summability of IID Random Variables in Two or Higher Dimensions

    d-dimensional (for d > 1) efficient range-summability (dD-ERS) of random variables (RVs) is a fundamental algorithmic problem with applications to two important families of database problems, namely, fast approximate wavelet tracking (FAWT) on data streams and approximately answering range-sum queries over a data cube. Whether efficient solutions exist for the dD-ERS problem, and for the latter database problem, have been two long-standing open questions. Both are solved in this work. Specifically, we propose a novel solution framework for dD-ERS on RVs that have a Gaussian or Poisson distribution. Our dD-ERS solutions are the first to have polylogarithmic time complexity. Furthermore, we develop a novel k-wise independence theory that allows our dD-ERS solutions to have both high computational efficiency and strong provable independence guarantees. Finally, we show that under a sufficient and likely necessary condition, certain existing solutions for 1D-ERS can be generalized to higher dimensions.

    A Dyadic Simulation Approach to Efficient Range-Summability

    Efficient range-summability (ERS) of a long list of random variables is a fundamental algorithmic problem with three important database applications, namely, data stream processing, space-efficient histogram maintenance (SEHM), and approximate nearest neighbor searches (ANNS). In this work, we propose a novel dyadic simulation framework and use it to develop three novel ERS solutions, namely Gaussian-dyadic simulation tree (DST), Cauchy-DST and Random Walk-DST. We also propose novel rejection sampling techniques to make these solutions computationally efficient. Furthermore, we develop a novel k-wise independence theory that allows our ERS solutions to have both high computational efficiency and strong provable independence guarantees.
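
As a rough illustration of the dyadic idea behind the Gaussian case, the sketch below answers range-sum queries over an implicit list of 2^log_u standard normal variables without ever materializing the list. It is a minimal sketch under stated assumptions, not the paper's Gaussian-DST: the function names (`range_sum`, `_node_gauss`, `_descend`) are hypothetical, and the per-node randomness comes from a SHA-256-seeded generator chosen only for convenience, which carries none of the k-wise independence guarantees the abstract refers to. It relies on the standard fact that, for two independent N(0, m) halves with total S, the left half conditioned on S is distributed as N(S/2, m/2).

```python
import hashlib
import math
import random


def _node_gauss(seed, level, index):
    """Deterministic N(0, 1) draw keyed by (seed, level, index).

    Assumption: hash-seeded PRNG used for illustration only.
    """
    digest = hashlib.sha256(f"{seed}:{level}:{index}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return rng.gauss(0.0, 1.0)


def _descend(seed, level, index, node_sum, lo, hi):
    """Return the part of node_sum that falls inside [lo, hi).

    The node at (level, index) covers [index << level, (index + 1) << level).
    """
    start = index << level
    end = start + (1 << level)
    if hi <= start or end <= lo:
        return 0.0  # node is disjoint from the query
    if lo <= start and end <= hi:
        return node_sum  # node is fully covered by the query
    # Partial overlap, so the node has at least two elements (level >= 1).
    half = 1 << (level - 1)
    # Left-half sum given the parent sum: N(node_sum / 2, half / 2).
    left = node_sum / 2 + math.sqrt(half / 2) * _node_gauss(seed, level - 1, 2 * index)
    right = node_sum - left
    return (_descend(seed, level - 1, 2 * index, left, lo, hi)
            + _descend(seed, level - 1, 2 * index + 1, right, lo, hi))


def range_sum(seed, log_u, lo, hi):
    """Sum of X[lo:hi] for implicit iid N(0, 1) variables X[0 .. 2**log_u)."""
    universe = 1 << log_u
    if not 0 <= lo <= hi <= universe:
        raise ValueError("need 0 <= lo <= hi <= 2**log_u")
    if lo == hi:
        return 0.0
    # The sum of all 2**log_u variables is N(0, 2**log_u).
    root_sum = math.sqrt(universe) * _node_gauss(seed, log_u, 0)
    return _descend(seed, log_u, 0, root_sum, lo, hi)
```

Each query visits a number of tree nodes proportional to log_u, and queries against the same seed are mutually consistent: `range_sum(seed, k, a, c)` equals `range_sum(seed, k, a, b) + range_sum(seed, k, b, c)` up to floating-point rounding.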

    Rethinking Similarity Search: Embracing Smarter Mechanisms over Smarter Data

    In this vision paper, we propose a shift in perspective for improving the effectiveness of similarity search. Rather than focusing solely on enhancing data quality, particularly that of machine learning-generated embeddings, we advocate for a more comprehensive approach that also enhances the underlying search mechanisms. We highlight three novel avenues that call for a redefinition of the similarity search problem: exploiting implicit data structures and distributions, engaging users in an iterative feedback loop, and moving beyond a single query vector. These novel pathways have gained relevance in emerging applications such as large-scale language models, video clip retrieval, and data labeling. We discuss the corresponding research challenges posed by these new problem areas and share insights from our preliminary discoveries.

    RECIPE: Rateless Erasure Codes Induced by Protocol-Based Encoding

    LT (Luby transform) codes are a celebrated family of rateless erasure codes (RECs). Most existing LT codes were designed for applications in which a centralized encoder possesses all message blocks and is solely responsible for encoding them into codewords. Distributed LT codes, in which message blocks are physically scattered across multiple locations (encoders) that need to perform the encoding collaboratively, have never been systematically studied despite their growing importance in applications. In this work, we present the first systematic study of LT codes in the distributed setting, and make the following three major contributions. First, we show that only a proper subset of LT codes are feasible in the distributed setting, and give the necessary and sufficient condition for such feasibility. Second, we propose a distributed encoding protocol that can efficiently implement any feasible code. The protocol is parameterized by a so-called action probability array (APA) that is only a few KBs in size, and any feasible code corresponds to a valid APA setting and vice versa. Third, we propose two heuristic search algorithms that have led to the discovery of feasible codes that are much more efficient than the state of the art.
    Comment: Accepted by IEEE ISIT 202
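
For readers unfamiliar with the centralized setting that the abstract contrasts against, the sketch below shows how a single encoder holding all message blocks might produce one LT-coded symbol. It is a minimal sketch under stated assumptions: it uses the textbook ideal soliton degree distribution, not any of the codes discovered in the paper, and it does not model the distributed protocol or the action probability array. The function names (`ideal_soliton`, `lt_encode_symbol`) are hypothetical.

```python
import random


def ideal_soliton(k):
    """Ideal soliton distribution over degrees 1..k.

    rho(1) = 1/k and rho(d) = 1 / (d * (d - 1)) for d = 2..k.
    """
    return [1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]


def lt_encode_symbol(blocks, rng):
    """Produce one coded symbol from equal-length message blocks.

    Returns (neighbors, symbol), where neighbors lists the indices of the
    blocks that were XORed together to form symbol.
    """
    k = len(blocks)
    # Step 1: sample a degree from the degree distribution.
    degree = rng.choices(range(1, k + 1), weights=ideal_soliton(k))[0]
    # Step 2: pick that many distinct blocks uniformly at random.
    neighbors = sorted(rng.sample(range(k), degree))
    # Step 3: XOR the chosen blocks byte by byte.
    symbol = bytearray(len(blocks[0]))
    for i in neighbors:
        for j, byte in enumerate(blocks[i]):
            symbol[j] ^= byte
    return neighbors, bytes(symbol)
```

Because the code is rateless, the encoder can call `lt_encode_symbol` as many times as needed; a receiver would need the neighbor list (or the seed that generated it) alongside each symbol to decode.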